I/O Throttling and Coordination for MapReduce

Authors

Siyuan Ma

Xian-He Sun

Ioan Raicu

Abstract

As a leading framework for data-intensive computing, MapReduce has gained enormous popularity in large-scale data analysis. With the increasing adoption of multi-core and many-core platforms, more and more MapReduce tasks now run on the same node and share the same storage resources. This concurrency raises the issue of I/O stream congestion: we have observed significant throughput drops and task delays caused by I/O stream congestion in the MapReduce framework. In this paper, we propose two techniques to address I/O stream congestion in MapReduce tasks. First, I/O stream throttling limits the number of concurrent I/O streams to avoid throughput drops. Second, to alleviate I/O contention among multiple MapReduce jobs, I/O coordination orders I/O streams according to job priority. By granting I/O resources exclusively to streams with higher priorities, coordination effectively shortens the average job completion time. Experimental results on Hadoop show that the proposed techniques improve the average job completion time by up to 33.74%. In addition, the proposed techniques greatly accelerate the execution of high-priority jobs, showing that they can help provide QoS in the MapReduce framework.

Keywords: I/O stream; MapReduce; I/O scheduling; throttling; coordination
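The two mechanisms summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop implementation: it assumes a single node, models each I/O stream as a thread that must hold a slot while it reads or writes, and uses hypothetical names (`IOCoordinator`, `max_streams`, `stream`, `demo_grant_order`). The `max_streams` limit stands in for throttling, and the priority heap stands in for coordination by job priority (here, a smaller number is assumed to mean higher priority).

```python
import heapq
import itertools
import threading
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class IOCoordinator:
    """Hypothetical per-node gate for I/O streams (illustrative, not Hadoop's API).

    Throttling: at most `max_streams` streams hold a slot at once.
    Coordination: waiting streams are admitted strictly by priority
    (smaller number = higher priority, an assumption); ties are FIFO.
    """

    def __init__(self, max_streams: int) -> None:
        if max_streams < 1:
            raise ValueError("max_streams must be >= 1")
        self._max = max_streams
        self._active = 0
        self._waiting: List[Tuple[int, int]] = []  # min-heap of (priority, ticket)
        self._tickets = itertools.count()
        self._cond = threading.Condition()

    def acquire(self, priority: int) -> None:
        with self._cond:
            entry = (priority, next(self._tickets))
            heapq.heappush(self._waiting, entry)
            # Block until a slot is free and this stream is the best-ranked waiter.
            while self._active >= self._max or self._waiting[0] != entry:
                self._cond.wait()
            heapq.heappop(self._waiting)
            self._active += 1
            # Wake the others: the next waiter may fit into a remaining slot.
            self._cond.notify_all()

    def release(self) -> None:
        with self._cond:
            if self._active == 0:
                raise RuntimeError("release() without matching acquire()")
            self._active -= 1
            self._cond.notify_all()

    @contextmanager
    def stream(self, priority: int) -> Iterator[None]:
        """Hold a slot for the duration of one I/O stream."""
        self.acquire(priority)
        try:
            yield
        finally:
            self.release()

    def waiting_count(self) -> int:
        with self._cond:
            return len(self._waiting)

    def active_count(self) -> int:
        with self._cond:
            return self._active


def demo_grant_order() -> List[int]:
    """Queue three streams behind a busy single slot and record the grant order."""
    coord = IOCoordinator(max_streams=1)
    coord.acquire(priority=0)  # occupy the only slot so the others must queue
    order: List[int] = []

    def worker(priority: int) -> None:
        with coord.stream(priority):
            order.append(priority)  # only one stream is active, so this is safe

    threads = [threading.Thread(target=worker, args=(p,)) for p in (3, 1, 2)]
    for t in threads:
        t.start()
    deadline = time.monotonic() + 5.0
    while coord.waiting_count() < len(threads) and time.monotonic() < deadline:
        time.sleep(0.01)
    coord.release()  # free the slot; waiters are admitted in priority order
    for t in threads:
        t.join()
    return order
```

Once all three streams are queued, `demo_grant_order()` should return `[1, 2, 3]`: streams are admitted by priority rather than by arrival order. In this sketch a task would wrap each spill or merge write in `with coord.stream(job_priority): ...`. Strict priority admission means low-priority streams can wait indefinitely while higher-priority work keeps arriving; the sketch does not attempt to address that.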

Similar resources

ThemisMR: An I/O-Efficient MapReduce

“Big Data” computing increasingly utilizes the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, and so minimizing the number of I/O operations is critical to improving their performance. In this work, we present ThemisMR, a MapReduce implementation that reads and writes data records to disk exactly twice, which is the minimum amou...

Throttling I/O Streams to Accelerate File-IO Performance

To increase the scale and performance of scientific applications, scientists commonly distribute computation over multiple processors. Often without realizing it, file I/O is parallelized with the computation. An implication of this I/O parallelization is that multiple compute tasks are likely to concurrently access the I/O nodes of an HPC system. When a large number of I/O streams concurrently...

Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model

In Cloud storage of multiple CPU cores, many Mapreduce applications may run in parallel on each compute node and collocate with local Disks storage. These Disks storage are shared by multiple applications that use full CPU power of the node. Each application tends to issue contiguous I/O requests in parallel to the same Disk; however if large number of Mapreduce tasks enters the I/O phase at th...

Thermal Attacks on Storage Systems

Disk drives are a performance bottleneck for data-intensive applications. Drive manufacturers have continued to increase the rotational speeds to meet performance requirements, but the faster drives consume more power and run hotter. Future drives will soon be operating at temperatures that threaten drive reliability. One strategy that has been proposed for increasing drive performance without ...

The Efficiency of MapReduce in Parallel External Memory

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practise, theoretical footing is still limited and only little work has been done yet to put MapReduce on a par with the major computational models. Following pioneer work that relates the MapReduce framework with ...

Publication date: 2012